MM225 (PH XXX): AI and Data Science (2023) Autumn
M.P. Gururajan – Probability (Meta Dept)
Hina Gokhale – Statistics (Meta Dept)
Prerequisite
None
Course Content
1. Programming Basics (25%): Python programming, basic data structures in Python, data handling, introduction to data file I/O, introduction to n-d arrays (NumPy), introduction to plotting.
2. Introduction to Probability (25%): Sample spaces and events. Probability axioms. Properties of probability. Counting techniques. Random variables. Expectations and variances. Visualizing PDFs: point plot, PDF, CDF, histogram, binning issues in histograms. Conditional probabilities and conditional expectation. Independence. Important discrete and continuous distributions. Bivariate distributions. Visualization of the relationship between two variables: bivariate histogram, conditional PDFs. Joint probability distributions. Multivariate normal distributions with the corresponding mean vectors, variance-covariance matrices and correlation matrices.
3. Hypothesis Testing (5%): Type 1 and Type 2 errors. Testing for parameters of a normal distribution and for percentages, based on a single sample and on two samples. Introduction to the chi-squared test. The concept of p-value.
4. Exploratory Data Analysis and Data Visualization (10%): Unsupervised data exploration methods: PCA, SVD, t-SNE, etc.
5. Introduction to Supervised Learning (25%): What is learning; learning objectives. Training, validation and testing. General linear regression with hypothesis testing for regression coefficients and model ANOVA. Comparing performance and tests using one-way/multi-way ANOVA. Classification and regression. Neural networks, CNNs.
6. Department-specific Applications (10%)
Books
● Principles and Techniques of Data Science, by Sam Lau, Joey Gonzalez and Deb Nolan, 2019, available online at https://www.textbook.ds100.org/intro
● Python for Data Analysis, by Wes McKinney, O'Reilly, 2013
● CUDA by Example: An Introduction to General-Purpose GPU Programming, by Jason Sanders and Edward Kandrot, Nvidia, 2010
● Parallel Computing for Data Science: With Examples in R, C++ and CUDA, by Norman Matloff, CRC Press, Boca Raton
● Pattern Recognition and Machine Learning, by Christopher Bishop, Springer, 2011
● The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, by Trevor Hastie, Robert Tibshirani and Jerome Friedman (Springer Series in Statistics), 2016
● Dive into Deep Learning, by Aston Zhang, Zack C. Lipton, Mu Li and Alexander Smola, 2020 (https://d2l.ai)
● Deep Learning, by I. Goodfellow, Y. Bengio and A. Courville, MIT Press, 2017
● Introduction to Probability and Statistics for Engineers and Scientists, 5th Edition, by Sheldon M. Ross
● Mathematics for Machine Learning, by Marc Deisenroth et al., Cambridge University Press, 2021
Review by Anonymous
Lectures
The biggest problem was that, after just one semester of a coding course (CS101), everyone was expected to be good enough at coding to tackle the labs without issue. After the first couple of labs, only the name of the upcoming problem was announced, and students had to prepare whatever they could from the topic name alone. The course was neither well organized nor well taught, as it had been put together in a very short time. Students frequently had to rely on online resources to understand the course content. The content (particularly in the first half of the semester) was nearly impossible to understand from the slides and quite difficult to understand from the books.
Assignments, Exams and Grading
Weekly Labs (30%), two quizzes (15% each), Midsem (15%), Endsem (40%, full syllabus). Until the Endsem, each question carried very little individual weightage; the Endsem had 8 questions, each worth 5% of the total grade. There was no attendance requirement. Keep in mind, though, that the new AI/DS course may differ significantly from MM225: the professors will be different, and the course will not be held in an LA shared by multiple departments.
Tips
The course followed "Introduction to Probability" by Grinstead and Snell (first half of the semester) and "Introduction to Probability and Statistics for Engineers and Scientists" by Sheldon M. Ross (second half of the semester). Truthfully, these books aren't particularly useful either; they were simply the source material. Online resources (such as videos), and even Wikipedia, were much better most of the time.